191 research outputs found

    Kalign – an accurate and fast multiple sequence alignment algorithm

    BACKGROUND: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. RESULTS: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. CONCLUSION: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences

    RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)

    BACKGROUND: Next-generation sequencing-based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the most 5’ ends of RNA molecules. After mapping the sequenced reads back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single-nucleotide resolution. RESULTS: We propose a pipeline to group the single-nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicas. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not. CONCLUSIONS: By examining the reproducible fine-scale organization of TSSs, we can detect many differentially regulated peaks undetected by previous approaches

    The human PINK1 locus is regulated in vivo by a non-coding natural antisense RNA during modulation of mitochondrial function

    BACKGROUND: Mutations in the PTEN induced putative kinase 1 (PINK1) are implicated in early-onset Parkinson's disease. PINK1 is expressed abundantly in mitochondria-rich tissues, such as skeletal muscle, where it plays a critical role determining mitochondrial structural integrity in Drosophila. RESULTS: Herein we characterize a novel splice variant of PINK1 (svPINK1) that is homologous to the C-terminus regulatory domain of the protein kinase. Naturally occurring non-coding antisense provides sophisticated mechanisms for diversifying genomes, and we describe a human-specific non-coding antisense expressed at the PINK1 locus (naPINK1). We further demonstrate that PINK1 varies in vivo when human skeletal muscle mitochondrial content is enhanced, supporting the idea that PINK1 has a physiological role in the mitochondrion. The observation of concordant regulation of svPINK1 and naPINK1 during in vivo mitochondrial biogenesis was confirmed using RNAi, where selective targeting of naPINK1 results in loss of the PINK1 splice variant in neuronal cell lines. CONCLUSION: Our data present the first direct observation that a mammalian non-coding antisense molecule can positively influence the abundance of a cis-transcribed mRNA under physiological abundance conditions. While our analysis implies a possible human-specific and dsRNA-mediated mechanism for stabilizing the expression of svPINK1, it also points to a broader genomic strategy for regulating a human disease locus and increases the complexity through which alterations in the regulation of the PINK1 locus could occur

    Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features

    In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/)

    Conserved temporal ordering of promoter activation implicates common mechanisms governing the immediate early response across cell types and stimuli

    Conserved temporal precedence between IEGs (light blue nodes) and other protein-coding genes (green nodes) is shown by directed edges. Genes annotated with the GO term 'response to endoplasmic reticulum stress' (GO:003497) have a red rectangle around the gene name; red squares indicate genes with CAGE clusters enriched for XBP1 transcription factor binding sites

    Pfam: clans, web tools and services

    Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/)

    Searching for a technology-driven acute rheumatic fever test: The START study protocol

    Introduction: The absence of a diagnostic test for acute rheumatic fever (ARF) is a major impediment in managing this serious childhood condition. ARF is an autoimmune condition triggered by infection with group A Streptococcus. It is the precursor to rheumatic heart disease (RHD), a leading cause of health inequity and premature mortality for Indigenous peoples of Australia, New Zealand and internationally. Methods and analysis: 'Searching for a Technology-Driven Acute Rheumatic Fever Test' (START) is a biomarker discovery study that aims to detect and test a biomarker signature that distinguishes ARF cases from non-ARF, and use systems biology and serology to better understand ARF pathogenesis. Eligible participants with ARF diagnosed by an expert clinical panel according to the 2015 Revised Jones Criteria, aged 5-30 years, will be recruited from three hospitals in Australia and New Zealand. Age, sex and ethnicity-matched individuals who are healthy or have non-ARF acute diagnoses or RHD, will be recruited as controls. In the discovery cohort, blood samples collected at baseline, and during convalescence in a subset, will be interrogated by comprehensive profiling to generate possible diagnostic biomarker signatures. A biomarker validation cohort will subsequently be used to test promising combinations of biomarkers. By defining the first biomarker signatures able to discriminate between ARF and other clinical conditions, the START study has the potential to transform the approach to ARF diagnosis and RHD prevention. Ethics and dissemination: The study has approval from the Northern Territory Department of Health and Menzies School of Health Research ethics committee and the New Zealand Health and Disability Ethics Committee. It will be conducted according to ethical standards for research involving Indigenous Australians and New Zealand Māori and Pacific Peoples. Indigenous investigators and governance groups will provide oversight of study processes and advise on cultural matters